Data farm: Information system for collecting, storing and processing unstructured data from heterogeneous sources
Abstract
The original information system «data farm» is presented. Today, the successful application of artificial intelligence algorithms, primarily deep learning based on neural networks, depends almost completely on the availability of data. The larger the amount of data (big data), the better the results of the algorithms. There are well-known examples of such algorithms from Facebook, Google, Microsoft, Yandex, etc. The data must contain both a training sample and a test sample. Moreover, the data must be of good quality and have a certain structure; ideally, it should be labeled in order for the algorithms to work adequately. This is a serious problem requiring huge computational and human resources. The paper is dedicated to solving this problem. Today the data farm is a rather complex information system built on a modular basis, much like a Lego construction set. The separate modules of the system are various modern technologies and entire artificial intelligence libraries, and all together they are designed to automate the process of obtaining and structuring high-quality big data in various subject domains. The system has been tested on COVID-19 data for the regions of Russia and countries around the world. In addition, a user-friendly interface for visualizing the collected and processed data was developed. This makes it possible to conduct visual numerical experiments in computer simulation and compare them with real data, turning the data farm into an intelligent decision support system.
Similar resources
A Realization of an Automated Data Flow for Data Collecting , Processing , Storing and Retrieving
GEONET is a database system developed at the Stanford Linear Accelerator Center for the alignment of the Stanford Linear Collider. It features an automated data flow, ranging from data collection using HP110 handheld computers to processing, storing and retrieving data and finally to adjusted coordinates. This paper gives a brief introduction to the SLC project and the applied survey methods. I...
Information Integration for Heterogeneous Data Sources
Information retrieval from heterogeneous information systems is required but challenging at the same time, as data is stored and represented in different data models in different information systems. Integrating information from heterogeneous data sources into a single data source faces the major challenge of information transformation, with different formats and constraints in data transformati...
Integrating and Processing Events from Heterogeneous Data Sources
Environmental monitoring studies present many challenges. A huge amount of data are provided in different formats from different sources (e.g. sensor networks and databases). This paper presents a framework we have developed to overcome some of these problems, based on combining aspects of Enterprise Service Bus (ESB) architectures and Event Processing mechanisms. First, we treat integration us...
Creating Relational Data from Unstructured and Ungrammatical Data Sources
In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the Web is neither grammatical nor formally structured, making querying difficult. Examples of these types of data sources are online classifieds like Craigslist and auction item listings like eBay. We call this unstructu...
Ontology-based information extraction and integration from heterogeneous data sources
In this paper we present the design, implementation and evaluation of SOBA, a system for ontology-based information extraction from heterogeneous data resources, including plain text, tables and image captions. SOBA is capable of processing structured information, text and image captions to extract information and integrate it into a coherent knowledge base. To establish coherence, SOBA interli...
Journal
Journal title: Trudy Instituta sistemnogo programmirovaniâ
Year: 2023
ISSN: 2079-8156, 2220-6426
DOI: https://doi.org/10.15514/ispras-2023-35(2)-5